<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="$lang$" xml:lang="$lang$"$if(dir)$ dir="$dir$"$endif$>
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <link rel="shortcut icon" type="image/x-icon" href="favicon.ico" />

  <!-- Add Open Graph meta tags for share image -->
  <meta property="og:image" content="https://github.com/natolambert/rlhf-book/blob/main/images/rlhf-book-share" />
  <meta property="og:image:width" content="1920" />
  <meta property="og:image:height" content="1080" />

  <!-- <meta property="og:title" content="$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$" /> -->
  <meta property="og:title" content="RLHF Book by Nathan Lambert" />
  <meta property="og:description" content="The Reinforcement Learning from Human Feedback Book" />
  <meta property="og:url" content="https://rlhfbook.com" />

  $for(author-meta)$
  <meta name="author" content="$author-meta$" />
$endfor$
$if(date-meta)$
  <meta name="dcterms.date" content="$date-meta$" />
$endif$
$if(keywords)$
  <meta name="keywords" content="$for(keywords)$$keywords$$sep$, $endfor$" />
$endif$
  <!-- <title>$if(title-prefix)$$title-prefix$ – $endif$$pagetitle$</title> -->
  <!-- SEO and Open Graph titles -->
  <title>
    RLHF Book by Nathan Lambert
  </title>

  <style>
    $styles.html()$
    $style.css()$
  </style>
$for(css)$
  <link rel="stylesheet" href="$css$" />
$endfor$
$if(math)$
  $math$
  <style>
    /* Target all possible MathJax display containers */
    .MathJax_Display, .MJXc-display, .math.display, mjx-container[jax="CHTML"][display="true"], mjx-container[jax="SVG"][display="true"] {
      overflow-x: auto;
      max-width: 100%;
      padding: 0.5em 0;
    }
  </style>
$endif$
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
$for(header-includes)$
  $header-includes$
$endfor$

<!-- custom js nav -->
<script src="nav.js" defer></script>

</head>
<body>
$for(include-before)$
$include-before$
$endfor$
$if(title)$
<header id="title-block-header">
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
<navigation-dropdown expanded="true"></navigation-dropdown>

$if(abstract)$
<div class="abstract">
  <h2>Abstract</h2>
  $abstract$
</div>
$endif$
</header>
$endif$
  <section id="changelog" style="padding: 20px; text-align: center;">
    <h2>Changelog</h2>
    <p><strong>14 Apr. - 16 Apr. 2025</strong>: Finish v0. Overoptimization, open questions, etc.</p>
    <p><strong>6 Apr. - 12 Apr. 2025</strong>: Evaluation section</p>
    <p><strong>28 Mar. - 5 Apr. 2025</strong>: Research on RLHF x Product, cleaning, improving website, reasoning section</p>
    <p><strong>17 Mar. - 27 Mar. 2025</strong>: Improving policy gradient section, minor changes</p>
    <p><strong>6 Mar. - 16 Mar. 2025</strong>: Finish DPO, major cleaning</p>
    <p><strong>26 Feb. - 5 Mar. 2025</strong>: Start DPO chapter, improve intro</p>
    <p><strong>20-25 Feb. 2025</strong>: Improve SEO, add IFT chapter, minor edits</p>
    <p><strong>10-15 Feb. 2025</strong>: RM additions, preference data, cleaning, policy gradient finalization</p>
    <p><strong>8 Feb. 2025</strong>: RM additions, editing, cleaning</p>
    <p><strong>4 Feb. 2025</strong>: PPO and GAE</p>
    <p><strong>2 Feb. 2025</strong>: Added changelog, revamped introduction</p>
  </section>
  <section id="acknowledgements" style="padding: 20px; text-align: center;">
    <h2>Acknowledgements</h2>
    <p>I would like to thank the following people who helped me directly with this project: Costa Huang (and, of course, Claude). Indirect shout-outs go to Ross Taylor, Hamish Ivison, John Schulman, Valentina Pyatkin, Daniel Han, Shane Gu, Joanne Jang, and others in my RL sphere.</p>
    <p>Additionally, thank you to the <a href="https://github.com/natolambert/rlhf-book/graphs/contributors">contributors on GitHub</a> who helped improve this project.</p>
  </section>
  <footer style="padding: 20px; text-align: center;">
    <hr>
    Citation <br>
    <div style="text-align: left; font-size: small; color: #888;">
      @book{rlhf2024,<br>
      &nbsp;&nbsp;author = {Nathan Lambert},<br>
      &nbsp;&nbsp;title = {Reinforcement Learning from Human Feedback},<br>
      &nbsp;&nbsp;year = {2024},<br>
      &nbsp;&nbsp;publisher = {Online},<br>
      &nbsp;&nbsp;url = {https://rlhfbook.com},<br>
      }
    </div>
    <div>
      <a href="https://github.com/natolambert/rlhf-book" target="_blank" rel="noopener noreferrer">
        <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" alt="GitHub" style="width: 40px; height: 40px;">
      </a>
      <!-- Add more social links here -->
    </div>
    <p>&copy; 2024 RLHF Book Team</p>
  </footer>  
</body>
</html>
